I was just curious to see which areas in Los Angeles use water the most. So I clicked over to Los Angeles Open Data, and got myself Water & Electricity Usage (2005 - 2013) data.

Let’s take a look.

First, note that the water is measured in Hundred Cubic Feet (HCF). 1 HCF = approx. 748 gallons

library(zoo)
library(plyr)
library(dplyr)
library(tidyr)
library(stringr)
library(zipcode)
library(ggplot2)
library(maps)
library(ggmap)
# load the data that was already downloaded
waterPower <- read.csv("data/Water_and_Electric_Usage_from_2005_-_2013.csv")

# Some clean up

# convert the date to Date format
waterPower$Date <- as.Date(as.yearmon(waterPower$Text.Date, "%b_%Y"))

# Split the zip to zip & geo codes
zipSplit <- ldply(str_split(waterPower$Zip.Code, "\n"))
geoSplit <- ldply(str_split(zipSplit$V2, ","))

geoSplit$V1 <- str_replace_all(geoSplit$V1, "\\(", "")
geoSplit$V1 <- str_replace_all(geoSplit$V1, " ", "")

geoSplit$V2 <- str_replace_all(geoSplit$V2, "\\)", "")
geoSplit$V2 <- str_replace_all(geoSplit$V2, " ", "")

waterPower$ZipCode <- zipSplit$V1
waterPower$Lon <- geoSplit$V1
waterPower$Lat <- geoSplit$V2

# Remove the old variables
waterPower$Zip.Code <- NULL
waterPower$Text.Date <- NULL
waterPower$Value.Date <- NULL

# Reorder the columns just for neatness sake
waterPower <- waterPower[, c(3, 4, 5, 6, 1, 2)]

# Clean up the temporary objects
rm(zipSplit)
rm(geoSplit)

Just the 2011 Data

Some years were missing some data, so I just decided to use only the 2011 data as it has the most complete data. I was also looking to add the population data from 2010 Census, so it works outs nicely since it’s not too far from the year 2010.

# Get just the 2011 data;  Get total by zipcode
waterPower2011 <- waterPower %>%
    filter(Date >= as.Date('2011-01-01')) %>%
    filter(Date < as.Date('2012-01-01')) %>%
    group_by(ZipCode) %>%
    summarise(Water.Use = sum(Water.Use), Power.Use = sum(Power.Use)) %>%
    arrange(desc(Water.Use))

# Load Population info
# Merge with Population table to get population
# The result is the data for just the Los Angeles
population <- read.csv("data/2010+Census+Population+By+Zipcode+(ZCTA).csv")
colnames(population) <- c("ZipCode", "Population")

population$ZipCode <- as.character(population$ZipCode)
waterPower2011 <- waterPower2011 %>%
    inner_join(population, by="ZipCode")

# Then add Latitude, Longitude for each zip code
data(zipcode)

waterPower2011$Lon <- sapply(waterPower2011$ZipCode, function(x) zipcode$longitude[zipcode$zip == x])
waterPower2011$Lat <- sapply(waterPower2011$ZipCode, function(x) zipcode$latitude[zipcode$zip == x])
# Show map of Los Angeles
losAngeles <- get_map(location="Los Angeles", zoom=11)

# Water User Per Month
ggmap(losAngeles) + geom_point(data=waterPower2011, aes(x=Lon, y=Lat, size=Water.Use, alpha=0.5)) + scale_size(range=c(3, 15))
## Warning in loop_apply(n, do.ply): Removed 45 rows containing missing values
## (geom_point).

Top 15 Areas in LA with most water usage

# Get top 15 Zip Codes
# waterPower2011 is already sorted by water.use in descening order, so just need to get top 15
waterTop15 <- head(waterPower2011, 15)
waterTop15$Water.Use.In.Gallons <- waterTop15$Water.Use * 748
waterTop15[, c("ZipCode", "Water.Use", "Water.Use.In.Gallons")]
## Source: local data frame [15 x 3]
## 
##    ZipCode Water.Use Water.Use.In.Gallons
## 1    90071   7782.35            5821197.8
## 2    91350   7627.80            5705594.4
## 3    90265   4611.75            3449589.0
## 4    90067   4218.42            3155378.2
## 5    90211   3794.00            2837912.0
## 6    91201   2486.00            1859528.0
## 7    90017   1845.90            1380733.2
## 8    90010   1456.67            1089589.2
## 9    90012   1196.28             894817.4
## 10   90057   1084.56             811250.9
## 11   90094   1073.38             802888.2
## 12   90502    996.87             745658.8
## 13   90020    980.17             733167.2
## 14   90014    979.05             732329.4
## 15   90744    854.92             639480.2

How about PER PERSON?

# Water Use Per Person (based on the corresponding 2010 census data)
waterPower2011$Water.Use.Per.Person <- signif(waterPower2011$Water.Use / waterPower2011$Population, 4)

# Water Use Per Person Per Month in gallons
waterPower2011$Water.Use.Per.Month.Per.Person.In.Gallons <- signif(((waterPower2011$Water.Use / 12) / waterPower2011$Population) * 748, 4)

waterTop15 <- waterPower2011 %>%
    select(ZipCode, Population, Water.Use.Per.Month.Per.Person.In.Gallons) %>%
    arrange(desc(Water.Use.Per.Month.Per.Person.In.Gallons))
head(waterTop15, 15)
## Source: local data frame [15 x 3]
## 
##    ZipCode Population Water.Use.Per.Month.Per.Person.In.Gallons
## 1    90071         15                                 32340.000
## 2    90067       2424                                   108.500
## 3    90211       8434                                    28.040
## 4    91210        328                                    26.040
## 5    90010       3800                                    23.890
## 6    90265      18116                                    15.870
## 7    91350      33348                                    14.260
## 8    90094       5464                                    12.250
## 9    90014       7005                                     8.712
## 10   90021       3951                                     8.489
## 11   91201      22781                                     6.802
## 12   90058       3223                                     6.117
## 13   90017      23768                                     4.841
## 14   90013      11772                                     3.494
## 15   90502      18010                                     3.450

Say what~~~???

There’s gotta be something wrong with this. The whole thing!

First. For the zip code 90071, the population is only 15 and SO MUCH WATER used!! Where is this??

With the help of some extra data from Los Angeles County GIS Data Portal, I was able to mark the area each zip code covers. Let’s take a look.

library(rgdal)
library(gpclib)
library(maptools)

LAZIP <- readOGR("data/LA_Zip_Shapefiles_Street", "CAMS_ZIPCODE_STREET_SPECIFIC")
zipAreas <- spTransform(LAZIP, CRS("+proj=longlat +datum=WGS84"))

gpclibPermit()  # Set gpclibPermitStatus() to TRUE

# Include the zip code data when building the data frame
zipData <- fortify(model=zipAreas, region="Name")   

# Now all ready to draw the map with the zip code boundary.

Zip 90071

thisZip <- "90071"
# Subset the specified zip codes
thisZipData <- subset(zipData, id == thisZip)
ggmap(get_map(location=thisZip, zoom=15, maptype="roadmap")) + 
    geom_polygon(aes(x=long, y=lat, group=group), fill='grey', size=.4, color='black', data=thisZipData, alpha=0.5) +
    labs(title=thisZip)

It’s part of downtown Los Angeles. It’s mostly commercial. There are businesses like Bank of America, The Westin Bonaventure Hotel, UnionBank, Chase Bank, Citibank, The Standard Hotel, Hilton Hotel, etc. The fact that there are several big hotels here may explain the water usage, however, that’s still a lot of water!

Zip 90017

Now let’s see the area where 23,768 people live and only use 4.841 gallons of water per person per month.

thisZip <- "90017"
# Subset the specified zip codes
thisZipData <- subset(zipData, id == thisZip)
ggmap(get_map(location=thisZip, zoom=15, maptype="roadmap")) + 
    geom_polygon(aes(x=long, y=lat, group=group), fill='grey', size=.4, color='black', data=thisZipData, alpha=0.5) +
    labs(title=thisZip)

Funny how it’s right next to 90071. It covers area that’s a lot bigger and it includes lots of residential areas with big apartment complexes. But how is that they only use 4.84 gallons of water per person PER MONTH?? That’s basically about five flushes of your toilet! So either the population is incorrect or the water usage number is incorrect. Or could it be that vast part of the area is served by a water company other than LA Department of Water and Power? I did a quick web search on that and it seems unlikely.

Since we’re on the topic of water usage PER PERSON, let’s take some guesses to see what would be realistic.

# Let me make some real rough guesses

# Toilet Flush: 1 in the morning; 1 at night; 1 flush = 1 gallon;
flushes <- 2 * 1

# Shower: 1 shower per day; 10 minute shower; 1.5 gallons per minute for typical shower heads;
shower <- 1 * (10 * 1.5)

# Brush teeth; Wash hands;  Let's just say 1 gallon per day;
handwash <- 1

# Prepare food; Do dishes: Kitchen faucet = 2 gallons per minute; On and off usage totaling about 15 minutes;
dishwash <- 2 * 15

# Laundry: 30 gallons per load;  About 3 loads per month;
laundry <- (30 * 3) / 30    # Divided by 30 to get daily average;

# Misc: 0.5 gallons per day;  Whatever else there is like watering plants, etc.
misc <- 0.5

# Let's not include anything outdoor like lawns, washing driveway, etc. for simplicity.
    
# 30 days in a month
(flushes + shower + handwash + dishwash + laundry + misc ) * 30
## [1] 1545

That’s 1,545 gallons per person! Obviously this is a really rough guess, but probably not too far off. So it’s safe to say per person water usage should be at least about 1,000 gallons per month.

Conclusion

Let’s try removing that one outlier and see how the PER person usage is throughout the city of Los Angeles:

# Let's remove that one outlier to get a better sense of the overall usage
waterPower2011 <- subset(waterPower2011, ZipCode != "90071")

ggmap(losAngeles) + geom_point(data=waterPower2011, aes(x=Lon, y=Lat, size=Water.Use.Per.Month.Per.Person.In.Gallons, alpha=0.5)) + scale_size(range=c(3,15))

But then again, it’s not too helpful knowing that the data needs some various cross checks. The population data could be wrong as the zip code approximately assigned to whatever that could be made out from the parcel data. It could also be possible that LADWP only released parts of measurements.

Regardless, this was done just for fun and not for some serious research. I dealt with the missing values by just removing them all together as it was ok to do for this particular purpose.

Bottomline is… let’s try our best to save water.